(12) United States Patent

Stephens et al. 



US006345040B1 

(10) Patent No.: US 6,345,040 B1
(45) Date of Patent: Feb. 5, 2002 



(54) SCALABLE SCHEDULED CELL SWITCH 
AND METHOD FOR SWITCHING 

(75) Inventors: Donpaul C. Stephens, Pittsburgh, PA 
(US); Jon C. R. Bennett, Sudbury, MA
(US) 

(73) Assignee: Marconi Communications, Inc., 
Warrendale, PA (US) 

( * ) Notice: Subject to any disclaimer, the term of this 
patent is extended or adjusted under 35 
U.S.C. 154(b) by 0 days. 

(21) Appl. No.: 09/126,475 

(22) Filed: Jul. 30, 1998 

(51) Int. Cl.7 G06R 31/08

(52) U.S. Cl. 370/232; 370/413

(58) Field of Search 370/229, 230, 

370/231, 232, 233, 234, 235, 236, 237, 
238, 389, 381, 392, 394, 395, 396, 397, 
412, 413, 414, 415, 416, 417, 418 

(56) References Cited 

U.S. PATENT DOCUMENTS 



5,592,476 A * 1/1997 Calamvokis et al. 370/392

5,689,508 A * 11/1997 Lyles 370/391 

5,956,322 A * 9/1999 Charny 370/232 

5,978,359 A * 11/1999 Caldara et al. 370/236

5,982,776 A * 11/1999 Manning et al 370/414 



6,081,507 A * 6/2000 Chao et al 370/235 

* cited by examiner 

Primary Examiner — Chau Nguyen 

Assistant Examiner — Inder Pal Mehra 

(74) Attorney, Agent, or Firm — Ansel M. Schwartz 

(57) ABSTRACT 

A telecommunications switch. The switch includes a first 
output port mechanism through which sessions having cells 
are sent at a total session rate to a network. The switch 
includes a first input port mechanism through which sessions 
are received from the network. The first input port mecha- 
nism is connected to the first output port mechanism. The 
first input port mechanism has a first guaranteed session rate.
The switch includes a second input port mechanism through 
which sessions are received from the network. The second 
input port mechanism is connected to the first output port 
mechanism. The second input port mechanism has a second 
guaranteed session rate, the sum of all guaranteed session 
rates are less than or equal to the total session rate. The 
switch includes a first scheduler connected to the first and 
second input port mechanisms and to the first output port 
mechanism for scheduling sessions of the input port mecha- 
nisms for service. The switch includes a server for providing 
service to sessions of the input port mechanisms. The server 
is connected to the first and second input port mechanisms 
and the first output port mechanism. A method for switching
sessions having cells.

29 Claims, 2 Drawing Sheets 
SCALABLE SCHEDULED CELL SWITCH
AND METHOD FOR SWITCHING 

FIELD OF THE INVENTION 

The present invention is related to a scheduler. More
specifically, the present invention is related to the design of 
a scheduler for a number of outputs in a distributed fashion. 

BACKGROUND OF THE INVENTION 

ATM is currently viewed as the technology behind future
integrated services networks. Within these networks, it is
desirable that individual flows (VCs) be able to receive a
guaranteed service rate through the network. While mechanisms
have been developed that enable this to be performed
in an output buffered switch, these are known not to scale.
For an integrated services network to be cost-effective, there
is a need for these services to be provided at low cost in a
large scale switch. The present invention provides an efficient
approach for providing bandwidth guarantees in a
scalable switch.

Currently, ATM switches are primarily constructed as
output buffered or shared memory based systems due to the
simplicity in making non-blocking devices. Larger scale (10
Gbps) ATM switches are presently constructed using a
layered set of output buffers, one that accepts traffic at the
aggregate rate of the device, another that accepts a slower
rate and attempts to divide the bandwidth among a set of
ports managed by the controller for this secondary memory.

Since the bandwidth managed for these ports is a valuable
resource subject to contention, schedulers are used to order
when connections are serviced for a given port. These
schedulers are generally placed on the aforementioned secondary
memory. The schedulers on these secondary memories
attempt to provide service guarantees for egress traffic
on the ports managed by them. These service guarantees are
based largely on the assumption that the main point of
contention among egress flows is at the secondary memory.
In actuality, only a fraction of the system bandwidth is
supplied to the secondary buffering point. This and the fact
that multiple ports are commonly associated with these units
leads to them often being referenced as multiplexors/
demultiplexors in the literature.

As systems are constructed of increasingly larger scale,
the fraction of total system bandwidth that can be provided
to a single multiplexor decreases asymptotically, thus reducing
the correctness of their model. When feedback flow
control, such as ABR, is performed in the multiplexors of
such large scale devices with incorrect system wide
information, it is easy to see that the system can place itself
into perennial instability.

While operational stability is of concern for a user of
equipment, the manufacturing cost of goods is of primary
concern for the company developing the switch (lower cost
implies higher profit margins for a given device cost). The
physical area, power, cooling, and cost of output buffered
switches is well known to be an N² problem, i.e., as the
number of ports grows, the sum of the input bandwidth for N
ports must be able to be buffered at each of N outputs. The
cost/performance of memory technology exists as a step
function. That is, for a desired amount of bandwidth, the cost
remains relatively stable or increases with some rate during
some periods with significant jumps in some locations.
While increased width may be used to decrease the bandwidth
required per-part, the systems are no longer able to
pipeline accesses internally. In the limit, a single cell can be
stored on one address in memory (53 Byte+overhead wide
memory). At very high speeds, only SRAM can sustain the
speed of accesses required. SRAM devices require more
transistors to implement a memory of a given size than
DRAM; this increases the cost of goods for these devices.
The board area, power, and cooling for these SRAM devices
(which grow with N²) is known to limit the scalability of
output buffered switches.
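
As a rough worked illustration of the N² scaling described above, the following restates the argument in symbols. The symbols R (per-port line rate), B_out (write bandwidth one output buffer must absorb in the worst case) and B_total, as well as the example figures N = 16 and R = 622 Mb/s, are assumptions introduced here for illustration and do not appear in the original text.

```latex
\[
B_{\text{out}} = N R, \qquad
B_{\text{total}} = N \cdot B_{\text{out}} = N^{2} R
\]
\[
N = 16,\; R = 622~\text{Mb/s}: \quad
B_{\text{out}} \approx 9.95~\text{Gb/s}, \quad
B_{\text{total}} \approx 159~\text{Gb/s}
\]
```

Under these assumptions, doubling the port count quadruples the aggregate buffer bandwidth that must be provisioned.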

In systems where connectivity alone is desired, many 
academic solutions have centered around constructs origi- 
nally designed to perform circuit switching such as banyan, 
batcher banyan, and even feedback based networks includ- 
ing the aforementioned as components. These switches are 
often simulated under highly optimistic assumptions of 
uniform traffic distributions and lightly loaded networks. 
Real data networks contain servers for file systems, web
pages, and additional services; these functions provide a
valuable resource unto themselves and are cause for the
output distribution to be asymmetric. The global Internet
utilizes a core set of protocols, with TCP/IP being the
most often used pair. The TCP stacks on end systems
attempt to keep traffic in the network so that whenever 
bandwidth becomes available, it may be used by the appli- 
cations. 

Being based on circuit switching constructs, their key 
metric is blocking probability (an output link remaining idle 
when cells are enqueued in the system for it). However, even 
under the optimistic assumptions used by their designers, 
analysis often shows perceptible blocking probability 
(which is zero for output buffered switches). These switches 
are also centralized in nature, i.e., the entire switch core is 
located on a chip, or set of chips that are co-located on a 
board. This impacts the ability to construct fault tolerant 
devices. Network devices, including switches and routers, 
within such networks are thus often placed under high loads. 
Some of these switches would restrict or drop traffic for 
uncongested ports if other ports became congested for a 
small period of time (10s of cells). It is for these reasons that 
such switches have not found commercial success. 

These circuit switch based devices generally had buffers 
placed at their inputs. Extensive analysis has been done on 
the tradeoffs of input versus output queued switches. In a 
non-blocking input buffered switch with FIFO queuing, 
when the cell at the head of the queue is blocked due to 
contention for a given output port, all cells behind it within 
the queue are prevented from being transmitted, even when 
their output port is idle. This situation is called head-of-line 
(HOL) blocking. This is a well known problem, that in the 
presence of uniformly distributed traffic across all ports 
results in limiting switch throughput to 58% of the band- 
width of the connecting links [M. Karol, M. Hluchyj, and S. 
Morgan. "Input Versus Output Queuing on a Space-Division 
Packet Switch." IEEE Transactions on Communications, 
35(12):1347-1356, December 1987]. In fact, throughput can
fall as low as that of a single link [S. Li, "Theory of Periodic 
Contention and its Application to Packet Switching." In 
Proceedings of IEEE INFOCOM '88, 320-325, March 
1988.]. 
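
The 58% figure can be checked numerically. The sketch below is a minimal Monte Carlo model, not taken from the patent: it assumes saturated inputs (every FIFO always has a head-of-line cell), uniformly random destinations, and a random choice among inputs contending for the same output. The function name `hol_saturation_throughput` and its parameters are hypothetical. For two ports the saturation throughput of this model is 0.75, and for many ports it approaches 2 − √2 ≈ 0.586, the limit reported by Karol et al.

```python
import random


def hol_saturation_throughput(n_ports, n_slots=5000, seed=0):
    """Estimate per-port throughput of a saturated FIFO input-queued switch."""
    rng = random.Random(seed)
    # Destination of the head-of-line (HOL) cell at each input.
    # Inputs are saturated, so a new HOL cell is always available.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        # Group inputs by the output their HOL cell is waiting for.
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # Each contended output accepts exactly one cell per slot.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            # The winner's next cell (uniform destination) moves to the head;
            # losers keep their blocked HOL cell for the next slot.
            hol[winner] = rng.randrange(n_ports)
            delivered += 1
    return delivered / (n_ports * n_slots)


if __name__ == "__main__":
    for n in (2, 8, 32):
        print(n, round(hol_saturation_throughput(n), 3))
```

The losers in each slot are what make this head-of-line blocking: cells queued behind them are never examined, even if their outputs are idle.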

While having poor throughput performance, avoiding 
buffering at the aggregate switch rate has encouraged further 
study in this field. [T. Anderson, S. Owicki, J. Saxe, and C. 
Thacker. "High Speed Switch Scheduling for Local Area 
Networks."] separates data forwarding from system 
scheduling, and utilizes per connection queues at the inputs, 
while using a crossbar with a centralized switch scheduler. 
Fixed size frames are used to support guaranteed traffic. 
While this solves the blocking problem of earlier input
queued switches, many limitations are present. Its guarantees
are rather coarse grain. A crossbar is actually not an
expensive mechanism in high speed switches, as the number
of internal ports is low. The key problem is the centralized
scheduler. While satisfactory for a local area switch of its
time, this leads to an unacceptable failure point for a large
scale enterprise or WAN switch required in the next few
years.

Noting that the performance of large scale systems is
limited by the bandwidth on the internal links, [F. M.
Chiussi, Y. Xia, and V. P. Kumar. "Backpressure in Shared-
Memory-Based ATM Switches under Multiplexed Bursty
Sources", In Proceedings of IEEE INFOCOM '96] explored
a switch using buffers at the inputs, along the outputs, and
within the switch core. While this was shown to yield
dramatic improvements in buffering requirements, no methods
were proposed for providing bandwidth guarantees.

What is needed is a mechanism for providing a wide array
of fine grain connection guarantees in a large scale networking
device at a moderate cost. It is among the objects of the
invention to overcome the aforementioned limitations of the
prior art by providing a method and apparatus for constructing
a distributed scheduler for a cell switched network.

SUMMARY OF THE INVENTION

The present invention pertains to a telecommunications
switch. The switch comprises a first output port mechanism
through which sessions having cells are sent at a total
session rate to a network. The switch comprises a first input
port mechanism through which sessions are received from
the network. The first input port mechanism is connected to
the first output port mechanism. The first input port mechanism
has a first guaranteed session rate. The switch comprises
a second input port mechanism through which sessions
are received from the network. The second input port
mechanism is connected to the first output port mechanism.
The second input port mechanism has a second guaranteed
session rate, the sum of all guaranteed session rates are less
than or equal to the total session rate. The switch comprises
a first scheduler connected to the first and second input port
mechanisms and to the first output port mechanism for
scheduling sessions of the input port mechanisms for service.
The switch comprises a server for providing service to
sessions of the input port mechanisms. The server is connected
to the first and second input port mechanisms and the
first output port mechanism.

The present invention pertains to a method for switching
sessions having cells. The method comprises the steps of
receiving a first session having cells at a first input port
mechanism of a switch. Then there is the step of storing the
first session in a first input queue of the first input port
mechanism. Next there is the step of receiving a second
session at a second input port mechanism of the switch. Then
there is the step of storing the second session in a second
input queue of the second input port mechanism. Next there
is the step of providing service from a server to the first
session at a first guaranteed session rate. Then there is the
step of transferring cells of the first session to a first output
queue of a first output queue mechanism. Next there is the
step of sending the cells of the first session out of the switch
to a network with a first output card connected to the first
output queue and the network. Then there is the step of
providing service from the server to the second session at a
second guaranteed session rate. Next there is the step of
transferring cells of the second session to the first output
queue. Then there is the step of sending the cells of the
second session of the switch to the network.

The present invention pertains to a method for building a
scheduler for a large scale switch. In particular, this invention
describes how to provide bandwidth and delay bounds
in a buffered crossbar switch. In such a switch, buffers are
maintained internal to the switch for each pair of input-output
nodes. When a cell is sent from the switch core to an
output, a credit is returned to the input that had sent the cell
into the switch core. An input may send a cell to any output
for which it has a credit. While prior art mechanisms have
employed these techniques to reduce the complexity of
switch design, they were unable to provide bandwidth or
delay guarantees. This invention utilizes a scheduled hierarchy
within the crossbar switch and at the input nodes to
select the order in which cells may pass through the switch
core. Separate matrix buffer pairs are maintained at each
node for all source nodes within its section for destinations
at itself and nodes in adjoining sections. These buffers
enable scheduling decisions to be made with minimal local
information, are small enough to fit on chip, and utilize a
mechanism to denote when buffers are available.

Credits are eventually returned to the source section of the
nodes (which provide data into the matrix). The source
section contains a per connection input queue which buffers
all traffic arriving on its input port(s). Cells are scheduled for
destination nodes (the output port interface section) within
the switch that have buffer credits based on the relative
needs of these destination nodes. This enables a very large
switch to be constructed that provides per-flow guarantees in
a distributed manner. Prior art schedulers are assumed to be
output buffered, prior art large scale switches only provide
connectivity.

BRIEF DESCRIPTION OF THE DRAWINGS

In the accompanying drawings, the preferred embodiment
of the invention and preferred methods of practicing the
invention are illustrated in which:

FIG. 1a is a schematic representation of a switch of the
present invention.

FIG. 1b is a schematic representation of a switch of the
present invention.

FIG. 2 is a schematic representation of a hierarchical
distribution regarding the switch.

FIG. 3 is a schematic representation of credit-flow in the
switch.

Referring now to the drawings wherein like reference
numerals refer to similar or identical parts throughout the
several views, and more specifically to FIGS. 1a and 1b
thereof, there is shown a telecommunications switch 10. The
switch 10 comprises a first output port mechanism 12
through which sessions having cells are sent at a total
session rate to a network 36. The switch 10 comprises a first
input port mechanism 14 through which sessions are
received from the network 36. The first input port mechanism
14 is connected to the first output port mechanism 12.
The first input port mechanism 14 has a first guaranteed
session rate. The switch 10 comprises a second input port
mechanism 16 through which sessions are received from the
network 36. The second input port mechanism 16 is connected
to the first output port mechanism 12. The second
input port mechanism 16 has a second guaranteed session
rate, the sum of all guaranteed session rates are less than or
equal to the total session rate. The switch 10 comprises a first
scheduler 18 connected to the first and second input port
mechanisms and to the first output port mechanism 12 for
scheduling sessions of the input port mechanisms for service.
The switch 10 comprises a server 20 for providing
service to sessions of the input port mechanisms. The server
20 is connected to the first and second input port mechanisms
and the first output port mechanism 12.

Preferably, the switch 10 includes a flow control mechanism
22 for ensuring cells are not lost after they are received
at an input port mechanism and until they are sent out an
output port mechanism. The flow control mechanism 22 is
connected to the input port mechanisms and the output port
mechanism. The switch 10 preferably includes a second
output port mechanism 24 connected to the server 20 and the
first and second input port mechanisms. Preferably, the
switch 10 includes a second scheduler 26 connected to the
first and second input port mechanisms and the second
output port mechanism 24 for scheduling sessions of the
input port mechanisms for service from the server 20.

Each output port mechanism preferably has a virtual time
associated with it. Preferably, the server 20 maintains the
virtual time for each output port mechanism. Each input port
mechanism preferably assigns a start time and a service
interval to each cell that arrives at the respective input port
mechanism. Preferably, the start time is the virtual time
when a cell first requests service from the server 20 from the
respective input port mechanism and the service interval is
the number of the cells that may be read by the server 20 for
every cell the server 20 reads from the respective input port
mechanism. Each input port mechanism preferably has a
finishing time equal to the start time plus the service interval.
Preferably, the server 20 provides service to the input port
mechanism having the smallest eligible finishing time.

Each input port mechanism preferably comprises an input
card 28 which receives cells and an input queue 30 in which
cells that are received by the input card 28 are stored, said
input queue 30 connected to the input card 28 and the server
20. Preferably, each output port mechanism includes an
output card 32 which sends cells to the network 36 and an
output queue 34 in which cells are stored for the output card
32, said output queue 34 connected to the output card 32 and
to the server 20.

The server 20 preferably reads a cell from the input queue
30 of the first input port mechanism 14 for an output queue
34 of the first output port mechanism 12, and the server 20
causes the finish time of the first input port mechanism 14 to
become the start time of the input queue 30 of the first input
port mechanism 14. Preferably, the server 20 compares the
start time of a cell that arrives at an empty input queue 30
with the virtual time of the queue the cell is to be transferred
to and sets the start time to the virtual time if the start time
is less than a virtual time, or sets the start time of the input
port mechanism to the virtual time of an output port mechanism
which sends a credit to the input port mechanism. The
server 20 preferably only resets the start time of an input
queue 30 when a cell is stored in an input queue 30 or read
out of an input queue 30, or a credit from an output port
mechanism is received by an input queue 30. Preferably, the
input card 28 may elect to send a cell to any output port
mechanism to which the input card 28 has a credit.

The present invention pertains to a method for switching
sessions having cells. The method comprises the steps of
receiving a first session having cells at a first input port
mechanism 14 of a switch 10. Then there is the step of
storing the first session in a first input queue 30 of the first
input port mechanism 14. Next there is the step of receiving
a second session at a second input port mechanism 16 of the
switch 10. Then there is the step of storing the second
session in a second input queue 30 of the second input port
mechanism 16. Next there is the step of providing service
from a server 20 to the first session at a first guaranteed
session rate. Then there is the step of transferring cells of the
first session to a first output queue 34 of a first output queue
34 mechanism. Next there is the step of sending the cells of
the first session out of the switch 10 to a network 36 with a
first output card 32 connected to the first output queue 34
and the network 36. Then there is the step of providing
service from the server 20 to the second session at a second
guaranteed session rate. Next there is the step of transferring
cells of the second session to the first output queue 34. Then
there is the step of sending the cells of the second session of
the switch 10 to the network 36.

Preferably, the transferring cells of the first session
includes the steps of producing a credit by the output port
mechanism which was transferred a cell from the first
session; and returning the credit to the first input port
mechanism 14 which transferred the cell to the first output
port mechanism 12.

The receiving the first session step preferably includes the
step of assigning a start time to the first input port mechanism
14 equal to the virtual time when the first session first
requests service from the server 20, and a service interval to
the first input port mechanism 14, where the service interval
is the number of cells that may be read by the server 20 for
every cell the server 20 reads from the first input port
mechanism 14. Preferably, after the assigning step there is
the step of determining a finishing time of the first input port
mechanism 14 equal to the starting time and the service
interval. The receiving the second session step preferably
includes the step of determining the finishing time of the
second input port mechanism 16.

Preferably, after the determining the finishing time of the
second input port mechanism 16 there is the step of providing
service by the server 20 to the input port mechanism
having the smallest eligible finishing time for the first output
port mechanism 12 based on a first scheduler 18 associated
with the first output port mechanism 12. After the providing
service step there is preferably the step of providing service
by the server 20 to the input port mechanism having the
smallest eligible finishing time for the second output port
mechanism 24 based on a second scheduler 26 associated
with the second output port mechanism. The second scheduler
26 is independent and separate from the first scheduler
18.

Preferably, after the transferring the first cell step there is
the step of changing the start time of the first input port
mechanism 14 to be the finish time of the first input port
mechanism 14 if additional cells remain in the first input port
mechanism 14. The receiving the first session step preferably
includes the step of receiving a second cell at the first
input port mechanism 14 while the first cell is in the first
input port mechanism 14 without changing the virtual time.
Preferably, the serving the first session step includes the
steps of receiving the first cell of the first session at the first
input port mechanism 14, comparing the start time of the
first input port mechanism 14 with virtual time of the output
port mechanism which the first cell is to be sent out of, and
setting the start time to the virtual time if the start time is less
than virtual time. After the returning the credit step, there is
preferably the step of updating the start time to virtual time
if the first input port mechanism 14 has no credits.
Preferably, after the receiving the first cell at the first input
port mechanism 14, there is the step of choosing an output
port mechanism to transfer the first cell to from all the output
port mechanisms which have provided credits to the first
input port mechanism 14.

In the operation of the preferred embodiment, a switch 10
core of an ATM switch in an ATM network 36 such as that
described in [F. M. Chiussi, Y. Xia, and V. P. Kumar,
"Backpressure in Shared-Memory-Based ATM Switches
under Multiplexed Bursty Sources", In Proceedings of IEEE
INFOCOM '96], incorporated by reference herein, may be
extended to provide bandwidth guarantees to sessions, such
as VCs, passing through it by the application of two overlaid
hierarchical fair queuing servers 20. As was shown in [T.
Anderson, S. Owicki, J. Saxe, and C. Thacker. "High Speed
Switch Scheduling for Local Area Networks."], incorporated
by reference herein, the delay bounds of a session in
a hierarchical fair queuing server 20 depends on W.F.I. of the
schedulers forming the scheduling tree. While this is
described in the context of hierarchical resource allocation at
an output node, these principles can be utilized to enable
bandwidth guarantees to be made in an input buffered switch
10 as described herein. See FIG. 2 which shows a hierarchical
distribution from an output port mechanism to input
port mechanisms which are having sessions passing through
them.

From the output card 32 of an output port mechanism, a
hierarchy is constructed comprised of all of the input cards
28 of the input port mechanisms, with their sessions beneath
them. The input cards 28 need to be allocated a rate that is
at least the sum of the rates of the sessions passing from that
input card 28 to the chosen output card 32. So long as the
input card 28 is served by the output card 32 at least as fast
as its sum of guaranteed session rates, the rates of the
sessions may be met. However, the contention at the input
card 28 among output cards 32 can be broken based on the
demands of the users, with essentially any contention breaking
scheme acceptable. Since an input card 28 may have
cells simultaneously offered for multiple output cards 32,
multiplexing can occur when an input card 28 schedules a
cell for each output. Then, the multicast cell is provided, in
turn, to each output card 32 to which it is to be transferred.
Output buffering and a small level of speedup is required to
insure high throughput under diverse traffic conditions.

Mechanisms such as those described in [F. M. Chiussi, Y.
Xia, and V. P. Kumar. "Backpressure in Shared-Memory-
Based ATM Switches under Multiplexed Bursty Sources",
In Proceedings of IEEE INFOCOM '96], incorporated by
reference herein, and others have used a simple credit-flow
control mechanism 22 internal to a switch core to insure the
internal system may be lossless. A credit is returned to an
input card 28 when data may be sent forward in the switch
10 for service by the server 20 and ultimate transfer to a
desired output card 32. A similar credit mechanism internal
to the core is used herein for such purposes. See FIG. 3
which shows how the data is transferred ultimately to an
output card 32 and the cell sending input queue 30 receives
a credit.

With such a credit-flow control mechanism 22 in place, a
crossbar internal to the switch 10 implements the separate
scheduler for each output card 32. In turn, each input card 28
is assigned two items, a start time and a service interval. A
third item, the finish time of the input queue 30, may be
calculated by adding the start time and the service interval.
Additionally, the server 20 maintains a separate output
virtual time, V(t), for each output card 32. The service
interval is the number of cells that may be read by the server
20 for every cell of the session it reads from the given input
queue 30. Among all of the input queues 30, the server 20
provides service to the cell having the smallest eligible finish
time first. The start time, service interval and finish time can
be stored in any memory associated with the input queue 30
or at the input port mechanism or at the server 20. A pointer
mechanism can be used to link the finish time to the cell to
possibly minimize storage usage.

When a cell is read from one of the input queues 30
for an output queue 34 of an output port mechanism, the
finish time of the input queue 30 is written to the start time
location so the start time is reset to the finish time of the cell
receiving service from the server 20. If additional cells
remain in the input queue 30 from which the cell receiving
service by the server 20 is read, the finish time which has
been reset to the start time, is the start time for the cell to
receive service next in the input queue 30. Otherwise, if
there are no additional cells remaining in the input queue 30,
the queue is marked as empty and is thus not considered by
any of the schedulers of the output port mechanisms. Thus,
the start times of the input queues 30 are set or reset
only when a cell is physically stored in the respective input
queue 30, or the input queue 30 receives a credit from an
output port mechanism which just receives a cell from the
queue 30.

At [illegible], the input card 28 selects among all
[illegible] for which it has cells using the [illegible] policy.
The start time [illegible]
input card 28 that previously had no cells enqueued in its
input queue 30 are to receive an arrival of a cell, or a credit
arrives from an output port mechanism where there were no
outstanding credits.

The flow control mechanism is aware, as is well known in
the art, of the service the server and the output port mechanisms
can provide. When an input card 28 is added, the flow
control mechanism reduces the service to the already present
input cards 28 so service is available for the new input card
28. In this way, cards can be added (or removed; service is
then increased to the remaining input cards 28).

Because this hierarchy uses a simple credit-flow
mechanism, the switch 10 may be designed in a pipelined
fashion. This may be performed by having multiple successive
chips that implement the credit-flow response mechanism
where each node contains a scheduler for each output
port mechanism. Scalability is enabled by latency tolerance.
The cross bar units for each output port mechanism need not
be co-located on a single device, and well known techniques
for designing fault tolerant cross point systems may be used
if those aspects are desired. By having a distributed scheduler
system, with each output port mechanism having its
own scheduler, it is a matter of only adding an output port
mechanism with a scheduler to the switch 10 to expand it,
or to contract it. Since each output port mechanism stands
independent and capable of immediate operation, it only
requires recognition by the server 20 that there is another
output port mechanism which is to receive service. The
server 20 can provide service to each of the output port
mechanisms in a round robin fashion or any other queuing
fashion so that each output port mechanism, which has
already determined the cell that is to receive service next,
can readily provide the next cell for service when that output
port mechanism's turn arrives.

For a more complete discussion of ATM, see, for instance,
"Gigabit Networking" by Craig Partridge, Addison Wesley,
1994; "ATM User Network Interface Specification, Version
3.0" by the ATM Forum, Prentice Hall, 1994; "Asynchronous
Transfer Mode Networks: Performance Issues", by
Raif O. Onvural, Artech House, Inc., Norwood, Mass. 1994,
and "Comparison of Rate-Based Service Disciplines" by


11/13/2003, EAST Version: 1.4.1 



US 6,345,040 Bl 



10 



10 



15 



25 



Hui Zhang and Srinivasan Keshav, Proceedings of ACM
SIGCOMM '91, all of which are incorporated by reference.
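The per-output scheduling described above (a start time and a service interval per input queue, a finish time equal to their sum, and service to the smallest eligible finish time) can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `InputQueue`, `OutputScheduler`, `enqueue`, and `serve_one` are hypothetical. The sketch also assumes that a queue is "eligible" when its start time does not exceed the output's virtual time, that the virtual time advances by one per cell served, and that the virtual time jumps forward when no backlogged queue is eligible; this section does not spell out those three details.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class InputQueue:
    """One input queue feeding a single output (hypothetical model)."""
    name: str
    service_interval: int  # cells the server may read per cell read here
    start_time: int = 0
    cells: deque = field(default_factory=deque)

    @property
    def finish_time(self) -> int:
        # Finish time is the start time plus the service interval.
        return self.start_time + self.service_interval


class OutputScheduler:
    """Scheduler for one output card, with its own virtual time."""

    def __init__(self, queues):
        self.queues = list(queues)
        self.virtual_time = 0

    def enqueue(self, queue, cell):
        # A cell arriving at an empty queue has its start time raised to
        # the virtual time if the start time has fallen behind it.
        if not queue.cells and queue.start_time < self.virtual_time:
            queue.start_time = self.virtual_time
        queue.cells.append(cell)

    def serve_one(self):
        backlogged = [q for q in self.queues if q.cells]
        if not backlogged:
            return None
        # Assumption: eligible means start_time <= virtual_time.
        eligible = [q for q in backlogged if q.start_time <= self.virtual_time]
        if not eligible:
            # Assumption: skip idle time so the server stays busy.
            self.virtual_time = min(q.start_time for q in backlogged)
            eligible = [q for q in backlogged
                        if q.start_time <= self.virtual_time]
        chosen = min(eligible, key=lambda q: q.finish_time)
        cell = chosen.cells.popleft()
        # After service, the old finish time becomes the new start time.
        chosen.start_time = chosen.finish_time
        self.virtual_time += 1  # assumption: one tick per cell served
        return chosen.name, cell
```

With two queues whose service intervals are 2 and 4, the queue with the shorter interval is served roughly twice as often, which is the intended effect of a smaller service interval.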

Although the invention has been described in detail in the 
foregoing embodiments for the purpose of illustration, it is 
to be understood that such detail is solely for that purpose 
and that variations can be made therein by those skilled in 
the art without departing from the spirit and scope of the 
invention except as it may be described by the following 
claims. 
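The credit-flow control described in the foregoing embodiments can be illustrated with a short Python sketch: an input port may forward a cell to an output port only while it holds a credit for that output, and the credit is returned when the output sends the cell onward. This is a sketch under stated assumptions, not the patented mechanism. The names `InputPort`, `OutputPort`, `try_send`, and `transmit` are hypothetical, and the sketch assumes that the credits issued for an output never exceed its buffer slots.

```python
from collections import deque


class OutputPort:
    """Output card with a bounded output queue (hypothetical model)."""

    def __init__(self, name, buffer_slots):
        self.name = name
        self.buffer_slots = buffer_slots
        self.queue = deque()  # entries are (source input port, cell)

    def accept(self, source, cell):
        # With correct credit accounting this queue can never overflow.
        assert len(self.queue) < self.buffer_slots, "credit accounting violated"
        self.queue.append((source, cell))

    def transmit(self):
        """Send one cell to the network and return a credit to its source."""
        if not self.queue:
            return None
        source, cell = self.queue.popleft()
        source.credits[self.name] += 1
        return cell


class InputPort:
    """Input card holding per-output credits and per-output pending cells."""

    def __init__(self, name, outputs, credits_per_output):
        self.name = name
        self.credits = {o.name: credits_per_output for o in outputs}
        self.pending = {o.name: deque() for o in outputs}

    def try_send(self, output):
        """Forward one pending cell if a credit for this output is held."""
        cells = self.pending[output.name]
        if cells and self.credits[output.name] > 0:
            self.credits[output.name] -= 1  # consume the credit
            output.accept(self, cells.popleft())
            return True
        return False  # no cell pending, or no credit: the cell waits
```

Because a cell waits at the input instead of being dropped when no credit is available, no cell is lost between the input queue and the output queue in this model.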

What is claimed is: 

1. A telecommunications switch comprising:
a first output port mechanism through which sessions
having cells are sent at a total session rate to a network;
a first input port mechanism through which sessions are 
received from the network, said first input port mecha- 
nism connected to the first output port mechanism, said 
first input port mechanism having a first guaranteed 
session rate; 

a second input port mechanism through which sessions 
are received from the network, said second input port
mechanism connected to the first output port 
mechanism, said second input port mechanism having 
a second guaranteed session rate, the sum of all guar- 
anteed session rates less than or equal to the total 
session rate; 

a first scheduler connected to the first and second input 
port mechanisms and to the first output port mechanism 
for scheduling sessions of the input port mechanisms 
for service; 

a server for providing service to sessions of the input port
mechanisms, said server connected to the first and 
second input port mechanisms and the first output port 
mechanism; and 

a flow control mechanism for ensuring cells are not lost 
after they are received at an input port mechanism and
until they are sent out an output port mechanism, said 
flow control mechanism connected to the input port 
mechanisms and the output port mechanism, said flow 
control mechanism adaptable to changes in input port 
mechanisms and output port mechanisms. 

2. A switch as described in claim 1 including a second 
output port mechanism connected to the server and the first 
and second input port mechanisms. 

3. A switch as described in claim 2 including a second 
scheduler connected to the first and second input port 
mechanisms and the second output port mechanism for 
scheduling sessions of the input port mechanisms for service 
from the server. 

4. A switch as described in claim 3 wherein each output 
port mechanism has a virtual time associated with it.

5. A switch as described in claim 4 wherein the server 
maintains the virtual time for each output port mechanism. 

6. A switch as described in claim 5 wherein each input 
port mechanism assigns a start time and a service interval to 
each cell that arrives at the respective input port mechanism. 

7. A switch as described in claim 6 wherein the start time 
is the virtual time when a cell first requests service from the 
server from the respective input port mechanism and the 
service interval is the number of the cells that may be read 
by the server for every cell the server reads from the
respective input port mechanism. 

8. A switch as described in claim 7 wherein each input 
port mechanism has a finishing time equal to the start time 
plus the service interval. 

9. A switch as described in claim 8 wherein the server
provides service to the input port mechanism having the
smallest eligible finishing time.

10. A switch as described in claim 9 wherein each input
port mechanism comprises an input card which receives 
cells and an input queue in which cells that are received by 
the input card are stored, said input queue connected to the 
input card and the server. 

11. A switch as described in claim 10 wherein each output 
port mechanism includes an output card which sends cells to 
the network and an output queue in which cells are stored for 
the output card, said output queue connected to the output 
card and to the server. 

12. A switch as described in claim 11 wherein the server 
reads a cell from the input queue of the first input port 
mechanism for an output queue of the first output port 
mechanism, and the server causes the finish time of the first 
input port mechanism to become the start time of the input 
queue of the first input port mechanism. 

13. A switch as described in claim 12 wherein the server
compares the start time of a cell that arrives at an empty 
input queue with the virtual time of the queue the cell is to 
be transferred to and sets the start time to the virtual time if 
the start time is less than a virtual time, or sets the start time 
of the input port mechanism to the virtual time of an output 
port mechanism which sends a credit to the input port 
mechanism.

14. A switch as described in claim 13 wherein the server 
only resets the start time of an input queue when a cell is 
stored in an input queue or read out of an input queue, or a 
credit from an output port mechanism is received by an input 
queue. 

15. A switch as described in claim 14 wherein the input 
card may elect to send a cell to any output port mechanism 
to which the input card has a credit. 

16. A method for switching sessions having cells comprising
the steps of:

receiving a first session having cells at a first input port
mechanism of a switch;

storing the first session in a first input queue of the first
input port mechanism;

receiving a second session at a second input port mechanism
of the switch;

storing the second session in a second input queue of the
second input port mechanism;

providing service from a server to the first session at a first
guaranteed session rate;

transferring cells of the first session to a first output queue
of a first output queue mechanism;

sending the cells of the first session out of the switch to
a network with a first output card connected to the first
output queue and the network;

providing service from the server to the second session at
a second guaranteed session rate;

transferring cells of the second session to the first output
queue;

sending the cells of the second session out of the switch to the
network.

17. A method as described in claim 16 wherein the 
transferring cells of the first session includes the steps of 
producing a credit by the output port mechanism which was 
transferred a cell from the first session; and returning the 
credit to the first input port mechanism which transferred the 
cell to the first output port mechanism. 

18. A method as described in claim 17 wherein the
receiving the first session step includes the step of assigning
a start time to the first input port mechanism equal to the
virtual time when the first session first requests service from
the server, and a service interval to the first input port
mechanism, where the service interval is the number of cells
that may be read by the server for every cell the server reads
from the first input port mechanism.

19. A method as described in claim 18 including after the
assigning step there is the step of determining a finishing
time of the first input port mechanism equal to the starting
time and the service interval.

20. A method as described in claim 19 wherein the
receiving the second session step includes the step of
determining the finishing time of the second input port
mechanism.

21. A method as described in claim 20 wherein after the
determining the finishing time of the second input port
mechanism there is the step of providing service by the
server to the input port mechanism having the smallest
eligible finishing time for the first output port mechanism
based on a first scheduler associated with the first output port
mechanism.

22. A method as described in claim 21 including after the
providing service step there is the step of providing service
by the server to the input port mechanism having the
smallest eligible finishing time for the second output port
mechanism based on a second scheduler associated with the
second output port mechanism, said second scheduler
independent and separate from said first scheduler.

23. A method as described in claim 22 including after the
transferring the first cell step there is the step of changing the
start time of the first input port mechanism to be the finish
time of the first input port mechanism if additional cells
remain in the first input port mechanism.

24. A method as described in claim 23 wherein the
receiving the first session step includes the step of receiving
a second cell at the first input port mechanism while the first
cell is in the first input port mechanism without changing the
virtual time.

25. A method as described in claim 24 wherein the serving
the first session includes the steps of receiving the first
cell of the first session at the first input port mechanism,
comparing the start time of the first input port mechanism
with the virtual time of the output port mechanism which the
first cell is to be sent out of, and setting the start time to the
virtual time if the start time is less than virtual time.

26. A method as described in claim 25 including after the
returning the credit step, there is the step of updating the start
time to virtual time if the first input port mechanism has no
credits.

27. A method as described in claim 26 wherein after the
receiving the first cell at the first input port mechanism, there
is the step of choosing an output port mechanism to transfer
the first cell to from all the output port mechanisms which
have provided credits to the first input port mechanism.

28. A method as described in claim 16 wherein the first
input port mechanism is a first ATM input port mechanism,
the second input port mechanism is a second ATM input port
mechanism, the server is an ATM server and the network is
an ATM network.

29. A switch as described in claim 1 wherein the first input
port mechanism is a first ATM input port mechanism, the
second input port mechanism is a second ATM input port
mechanism, the server is an ATM server, and the network is
an ATM network.

* * * * *